README for PPFN: A Dataset of Misinformation and Partisan Rhetoric on Philippine Facebook Pages
Submitted to ACL 2026 Findings | Harvard Dataverse

Overview
--------
This dataset contains 152,072 rows and 41 columns describing posts collected via CrowdTangle from top Philippine political Facebook pages. It includes both political and non-political posts, and covers social media activity, user engagement, and content characteristics relevant to the study of misinformation and partisan rhetoric in the Philippine online information ecosystem.

This resource is made available in conjunction with a paper submitted to ACL 2026 Findings:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

Dataset Structure
-----------------
Number of Rows: 152,072
Number of Columns: 41
Language(s): English and Filipino (Tagalog)
Source: CrowdTangle (Philippine political Facebook pages)

Key Columns:

1. Page Information:
   - Page Name, User Name: Identifiers for the Facebook page or user.
   - Facebook Id: Unique identifier for the page.
   - Page Category: Type of page (e.g., MEDIA, PERSON).
   - Page Admin Top Country: Country where most of the page's admins are based, as reported by CrowdTangle.
   - Page Description: Description of the page.
   - Page Created: Date the page was created.

2. Post Metadata:
   - Post Created, Post Created Date, Post Created Time: Timestamps of post creation.
   - Type: Type of post (e.g., Photo, Video, Link).
   - Message: Text content of the post (English and/or Filipino).
   - URL, Link, Final Link: Links associated with the post.

3. Engagement Metrics:
   - Total Interactions: Total engagements (reactions, comments, and shares).
   - Component counts: Likes, Comments, Shares, Love, Wow, Haha, Sad, Angry, Care.

4. Video Specifics (if applicable):
   - Video Share Status, Is Video Owner?: Details about video ownership and sharing.
   - Post Views, Total Views, Total Views For All Crossposts: Video view metrics.
   - Video Length: Duration of the video.

5. Sponsorship Data:
   - Sponsor Id, Sponsor Name, Sponsor Category: Information about sponsored content.

6. Additional Details:
   - Overperforming Score: Measure of how a post performs relative to expected benchmarks.
   - Is Political: Boolean label indicating whether the post is political in nature.
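
A minimal loading sketch for the columns listed above, using only the Python standard library. It assumes the data is distributed as a CSV file whose header row uses the column names given in this README; the file format, the textual encoding of Is Political, and the presence of thousands separators in counts are all assumptions to check against the actual Dataverse files. The helper names (parse_bool, parse_count, load_posts) are hypothetical, and the in-memory CSV reuses two rows from the Sample Data section instead of a real file path.

```python
import csv
import io


def parse_bool(value):
    """Map common textual spellings of a Boolean to True/False, else None."""
    text = (value or "").strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    return None


def parse_count(value):
    """Parse an integer count, tolerating thousands separators; blank -> None."""
    text = (value or "").replace(",", "").strip()
    return int(text) if text else None


def load_posts(file_obj):
    """Read posts from an open CSV file object into a list of dicts."""
    posts = []
    for row in csv.DictReader(file_obj):
        # Column names follow this README; adjust if the file differs.
        row["Is Political"] = parse_bool(row.get("Is Political"))
        row["Total Interactions"] = parse_count(row.get("Total Interactions"))
        posts.append(row)
    return posts


# In-memory stand-in built from two rows of the Sample Data table.
SAMPLE_CSV = '''Page Name,Post Created,Type,Total Interactions,Is Political
Rappler,2023-10-23 23:49:35,Photo,"5,232",False
Rappler,2023-10-23 23:34:05,Video,24,True
'''

posts = load_posts(io.StringIO(SAMPLE_CSV))
print(len(posts), posts[0]["Total Interactions"], posts[1]["Is Political"])
```

For the real file, replace io.StringIO(SAMPLE_CSV) with open(path, newline="", encoding="utf-8"), where path points to the downloaded file.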

Usage and Applications
----------------------
This dataset is particularly suited to NLP and computational social science research. Intended applications include:

1. Misinformation Detection: Identifying misinformation patterns in multilingual Philippine social media discourse.
2. Partisan Rhetoric Analysis: Studying the characteristics and reach of political versus non-political posts.
3. Code-switched NLP: Developing and evaluating models for English-Tagalog (Taglish) text.
4. Political Communication Research: Analyzing how political actors and media organizations engage online audiences.
5. Engagement Modeling: Training models to predict or explain post-level engagement metrics.
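
As a starting point for applications 2 and 5, the sketch below compares median Total Interactions between political and non-political posts. It assumes each row has already been parsed so that Is Political is a Boolean and Total Interactions is an integer; the function name median_interactions_by_label is hypothetical. The four input rows are taken from the Sample Data section and are far too few to support any conclusion about the full dataset.

```python
from collections import defaultdict
from statistics import median


def median_interactions_by_label(rows):
    """Return {Is Political value: median Total Interactions} for labeled rows."""
    groups = defaultdict(list)
    for row in rows:
        label = row["Is Political"]
        count = row["Total Interactions"]
        # Skip rows where either field is missing.
        if label is None or count is None:
            continue
        groups[label].append(count)
    return {label: median(values) for label, values in groups.items()}


# The four rows shown in the Sample Data section of this README.
sample_rows = [
    {"Is Political": False, "Total Interactions": 131},
    {"Is Political": False, "Total Interactions": 5232},
    {"Is Political": False, "Total Interactions": 3321},
    {"Is Political": True, "Total Interactions": 24},
]

summary = median_interactions_by_label(sample_rows)
print(summary)
```

The median is used here instead of the mean because engagement counts on social media are typically heavily skewed; that choice is a suggestion, not the method of the associated paper.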

Sample Data
-----------
| Page Name   | Post Created        | Type         | Total Interactions | Message                                              | Is Political |
|-------------|---------------------|--------------|--------------------|------------------------------------------------------|--------------|
| Harry Roque | 2023-10-23 23:59:56 | Native Video | 131                | Reclamation still on-going: why? Di ba pinatigil...? | False        |
| Rappler     | 2023-10-23 23:49:35 | Photo        | 5,232              | Bea Alonzo looked back on her experience...          | False        |
| Rappler     | 2023-10-23 23:42:19 | Photo        | 3,321              | REST IN PEACE, BOBI. Bobi, the world's oldest dog... | False        |
| Rappler     | 2023-10-23 23:34:05 | Video        | 24                 | Manila summons China's envoy, Marcos orders a probe. | True         |

Important Notes
---------------
- Data Privacy: Ensure compliance with applicable privacy laws (e.g., the GDPR and the Philippine Data Privacy Act of 2012) when using this dataset. All data was collected through CrowdTangle, which provides access to public Facebook content.

- Data Cleaning: Some fields may contain null or inconsistent values (e.g., Final Link, Page Description). Preprocessing is recommended prior to use.

- Engagement Metrics: The Total Interactions field aggregates all interaction types (reactions, comments, and shares) and should be interpreted cautiously, as different interaction types carry distinct social meanings.

- CrowdTangle Deprecation: CrowdTangle was shut down by Meta in August 2024. This dataset was collected prior to that date and cannot be refreshed through the same API. Researchers seeking to extend or replicate this dataset should consult the Meta Content Library as a potential alternative source.
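
The Data Cleaning and Engagement Metrics notes above can be sketched as a small preprocessing step: normalize null-like strings to None, then break a post's interactions down by type instead of relying on the aggregate alone. Everything here is an assumption to verify against the actual files: the set of null spellings in NULL_TOKENS, the storage of counts as strings, and the helper names (clean_value, clean_row, interaction_shares). The example post and its counts are invented purely for illustration and do not come from the dataset.

```python
# Assumed spellings of missing values; extend after inspecting the real data.
NULL_TOKENS = {"", "nan", "null", "none", "n/a"}

# Per-type count columns as named in this README.
INTERACTION_COLUMNS = [
    "Likes", "Comments", "Shares", "Love", "Wow", "Haha", "Sad", "Angry", "Care",
]


def clean_value(value):
    """Strip whitespace and map null-like strings to None."""
    if value is None:
        return None
    text = str(value).strip()
    return None if text.lower() in NULL_TOKENS else text


def clean_row(row):
    """Apply clean_value to every field of one row."""
    return {column: clean_value(value) for column, value in row.items()}


def interaction_shares(row):
    """Return each interaction type's share of the row's summed interactions."""
    counts = {}
    for column in INTERACTION_COLUMNS:
        raw = clean_value(row.get(column))
        # Missing counts are treated as zero; commas are thousands separators.
        counts[column] = int(raw.replace(",", "")) if raw is not None else 0
    total = sum(counts.values())
    if total == 0:
        return {column: 0.0 for column in INTERACTION_COLUMNS}
    return {column: count / total for column, count in counts.items()}


# Invented example post (not from the dataset).
example_post = {
    "Page Name": " Example Page ", "Final Link": "", "Page Description": "NaN",
    "Likes": "50", "Comments": "20", "Shares": "10", "Angry": "20",
}

cleaned = clean_row(example_post)
shares = interaction_shares(example_post)
print(cleaned["Final Link"], round(shares["Angry"], 2))
```

The shares are computed from the per-type columns and not from Total Interactions, so the sketch does not depend on how the aggregate was calculated.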

Citation
--------
If you use this dataset, please cite the associated paper, currently under submission to ACL 2026 Findings:

    Herrera, M. J., Jaidka, K., and Luyt, B. Misinformation Commands Attention:
    An English-Tagalog Dataset of Political Discussions on Philippine Facebook Pages.
    Submitted to Findings of the Association for Computational Linguistics: ACL 2026.

The dataset is hosted on Harvard Dataverse and should also be cited independently
per the repository's citation guidelines.